A systematic review of validity of US survey measures for assessing substance use and substance use disorders

Background The steep rise in substance use and substance use disorder (SUD) shows an urgency to assess its prevalence using valid measures. This systematic review summarizes the validity of measures to assess the prevalence of substance use and SUD in the US estimated in population and sub-population-based surveys. Methods A literature search was performed using nine online databases. Studies were included in the review if they were published in English and tested the validity of substance use and SUD measures among US adults at the general or sub-population level. Independent reviews were conducted by the authors to complete data synthesis and assess the risk of bias. Results Overall, 46 studies validating substance use/SUD (n = 46) measures were included in this review, in which 63% were conducted in clinical settings and 89% assessed the validity of SUD measures. Among the studies that assessed SUD screening measures, 78% examined a generic SUD measure, and the rest screened for specific disorders. Almost every study used a different survey measure. Overall, sensitivity and specificity tests were conducted in over a third of the studies for validation, and 10 studies used receiver operating characteristics curve. Conclusion Findings suggest a lack of standardized methods in surveys measuring and reporting prevalence of substance use/SUD among US adults. It highlights a critical need to develop short measures for assessing SUD that do not require lengthy, time-consuming data collection that would be difficult to incorporate into population-based surveys assessing a multitude of health dimensions. Systematic review registration PROSPERO CRD42022298280. Supplementary Information The online version contains supplementary material available at 10.1186/s13643-024-02536-x.


Introduction
Substance use remains a serious adverse health risk in the United States (US). Forty million Americans aged 12 years or older reported illicit drug use in the past month in 2021 (Substance Abuse and Mental Health Services Administration, 2022b), and over 106,000 people in the US died of an overdose in 2021 (National Institute on Drug Abuse, 2023). This is a dramatic increase of approximately 15% in overdoses within 1 year, signifying critical, life-threatening substance use problems and an associated overdose epidemic throughout the country. Notably, substance use problems that met the criteria for a substance use disorder (SUD) were reported by a sizeable proportion of the US population. More than 46 million people aged 12 years or older met the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria for SUD in the past year, according to the National Survey on Drug Use and Health (NSDUH), with the highest percentage of people with SUD being young adults aged 18-25 (25.6%), followed by adults aged 26 or older (16.1%) [1]. Unfortunately, population-based assessments for SUD are rare beyond the NSDUH, especially at substate levels, although they are imperative to inform appropriate resource allocation and population-based interventions for states responding to the SUD and overdose epidemics.
There are few population-based surveys conducted in the US that assess substance use and/or SUD. NSDUH is a good example of a survey that monitors annual national trends in substance use and mental health issues in the US and provides estimates of the need for substance use prevention and treatment programs [2]. However, it involves lengthy questions and branching logic that are not feasible for use in surveys covering multiple health domains. Another validated tool to assess SUD is the National Addictions Vigilance Intervention and Prevention Program (NAVIPPRO™) Addiction Severity Index-Multimedia Version® (ASI-MV®) [3]. However, results from this measure may not be generalizable because it is only used to evaluate those already seeking SUD treatment. In addition, selection bias is likely because the participants are selected based on convenience sampling among treatment centers [4]. Other measures that have been validated for assessing substance use in the US are the Drug Abuse Screening Test (DAST) [5]; the Alcohol, Smoking, and Substance Involvement Screening Test (ASSIST); and the Tobacco, Alcohol, Prescription medication, and other Substance use (TAPS) tool [6]. However, these survey measures also require multiple, lengthy questions to estimate the prevalence of SUD.
Validated substance use and SUD measures that are shorter and more versatile are needed to ease the incorporation of these measures into more multidimensional population health surveys to better assess and respond to the current US substance use and overdose epidemics. Much work has been done on validating alcohol and tobacco measures, such as ASSIST and TAPS [6]. We know of no review of validation research conducted on other substance use and/or SUD measures among the US population, although previous studies provide valuable insights into measures assessing the efficacy of substance use measures and interventions [7] and addressing psychometric properties of screening tools among specific settings or populations [8]. Thus, the purpose of this review is to comprehensively summarize published literature investigating the validity of substance use and SUD measures, other than alcohol and tobacco use, in US surveys to advance the use of these validated measures on more population-based surveys.

Search strategy
This systematic review followed the Preferred Reporting Items for Systematic reviews and Meta-analyses (PRISMA) guidelines [9] and was registered through PROSPERO (CRD42022298280). Potential eligible studies were identified by using the following nine electronic databases, starting from their inception up to November 22, 2021: PubMed, Scopus, CINAHL, PsycINFO, Academic Search Complete, Web of Science, ProQuest Theses and Dissertation Global, and Google Scholar. Primary keywords and phrases used for searching included "healthcare survey," "mental health," "substance use," and "validity." Detailed search strategies corresponding to the specific databases are shown in Supplementary Table 1.
The following study inclusion criteria were established a priori for use in this systematic review: (1) utilized existing surveys or questionnaires at the county level or higher (validation may have been done at a sub-population level) or at clinical settings in the US; (2) conducted in the US, to ensure the reviewed measures are applicable to US populations; (3) validity/validation testing conducted for measures of mental health and/or substance use; (4) study sample consisted of adults 18 years of age or older; (5) published in the English language; and (6) peer-reviewed, published studies, official reports from surveys, and doctoral dissertations. In addition, exclusion criteria were applied to those studies that (1) assessed the validity of measures unrelated to mental health/substance use (e.g., physical activity, chronic disease, infectious disease); (2) assessed the validity of alcohol and/or tobacco measures only; (3) were published as abstract only or did not have full texts available; (4) were protocols, editorials, reviews, or commentary; (5) validated a language translation or cultural version of an instrument; and (6) were conducted internationally. To better align with the aims of this review, studies validating only alcohol/tobacco use measures were excluded because they have been widely studied in previous literature [10][11][12][13].

Quality assessment
An adapted risk-of-bias tool was developed for the purpose of this systematic review to assess the validity of substance use and mental health survey instruments. This methodological quality assessment tool was adapted from a previously published tool which evaluated the rigor of validity testing in the Behavioral Risk Factor Surveillance System (BRFSS) literature [14]. The new risk-of-bias tool was used to assess the quality of the (1) methodology and (2) statistical analyses of studies included in the systematic review. The methodological component was scored from 0 to 3 (3 = studies utilizing a physical measurement(s) as a comparator during validity testing, which were considered to be the "gold standard," 2 = studies using measures other than an actual physical measure, 1 = studies that conducted face validity based on the researcher's judgment or a collective judgment, 0 = studies that did not report on the measurement used for validity testing). The statistical analysis component was scored from 0 to 2 (2 = using statistical analyses such as sensitivity and specificity, correlation coefficient, or mean difference, 1 = reporting prevalence estimates only, 0 = no information on statistical analysis was reported). The methodological and statistical component scores were then totaled for an overall quality assessment score. Total scores ranged from 0 to 5, with 5 demonstrating the highest quality.
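The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text; the category labels (`physical_measure`, `prevalence_only`, and so on) and the function name `quality_score` are hypothetical and do not come from the review.

```python
# Hypothetical labels for the methodological component (scored 0-3).
METHOD_SCORES = {
    "physical_measure": 3,  # physical measurement used as comparator ("gold standard")
    "other_measure": 2,     # comparator other than an actual physical measure
    "face_validity": 1,     # face validity from researcher or collective judgment
    "not_reported": 0,      # comparator used for validity testing not reported
}

# Hypothetical labels for the statistical analysis component (scored 0-2).
STAT_SCORES = {
    "statistical_test": 2,  # e.g. sensitivity/specificity, correlation, mean difference
    "prevalence_only": 1,   # prevalence estimates only
    "not_reported": 0,      # no information on statistical analysis
}


def quality_score(method: str, stats: str) -> int:
    """Return the overall quality score (0-5): the sum of the two components."""
    return METHOD_SCORES[method] + STAT_SCORES[stats]


print(quality_score("physical_measure", "statistical_test"))  # 5
print(quality_score("other_measure", "prevalence_only"))      # 3
```

Keeping the two components in separate lookup tables mirrors the way the tool reports a methodological and a statistical score before totaling them.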

Data synthesis
All identified studies were imported to an EndNote library. After removing duplicates, the initial title and abstract screenings were conducted independently by three reviewers (Y.T., N. W., E. O.) using the pre-established inclusion and exclusion criteria. This was followed by a full-text review conducted independently by three reviewers (Y.T., E. C., R. M.) for the first 10% of the included studies. They then convened to review their selections to ensure agreement and refine criteria. Interrater reliability was calculated in STATA [15] using Gwet's AC to ensure agreement [16]. The remaining 90% of the selected articles were then split between the three reviewers for full-text review. Articles where a reviewer was not sure if they should be included or excluded were discussed among the three reviewers and decided by the senior author for final selection.
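As a rough illustration of the agreement statistic mentioned above, the following Python sketch computes Gwet's first-order agreement coefficient (AC1) for the simplified case of two raters making include/exclude decisions. It is not the authors' Stata procedure, which involved three reviewers; the function name `gwet_ac1`, the category labels, and the example ratings are all assumptions made for illustration.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories=("include", "exclude")):
    """Gwet's AC1 for two raters and a fixed set of nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    if any(r not in categories for r in list(rater_a) + list(rater_b)):
        raise ValueError("rating outside the declared categories")
    n = len(rater_a)
    q = len(categories)
    # Observed agreement: share of items on which the two raters agree.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # pi_k: average proportion of ratings falling in category k across raters.
    counts = Counter(rater_a) + Counter(rater_b)
    pi = [counts[k] / (2 * n) for k in categories]
    # Chance agreement under Gwet's model.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical screening decisions for ten articles.
a = ["include", "include", "exclude", "exclude", "include",
     "exclude", "exclude", "exclude", "exclude", "exclude"]
b = ["include", "include", "exclude", "exclude", "exclude",
     "exclude", "exclude", "exclude", "exclude", "exclude"]
print(round(gwet_ac1(a, b), 2))  # 0.84
```

In this made-up example the raters agree on 9 of 10 articles, chance agreement is 0.375, and AC1 is (0.9 - 0.375) / (1 - 0.375) = 0.84.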
A data extraction form was created in Microsoft Excel to facilitate data extraction and synthesis. The form could capture up to 46 variables for each study. These variables were grouped into four main categories: study characteristics (authors, reference, year of publication, and name of journal), measure characteristics (whether the measure was used for disorder screening, the SUD being assessed by the measure, response rate, study duration, items measured, recall period, and recruitment procedure), participant characteristics (overall health status, age, sex, race, income, education), and validation methods (type of validation, statistical analysis, comparison measure, and key results). Additionally, a single article could be considered as multiple studies if it validated measures among multiple study populations. Articles that validated multiple survey measures among the same study population were considered to be one study. We evaluated the different types of validity using pre-established definitions to standardize the understanding of validity among reviewers. Our focus was on examining criterion validity (including concurrent, predictive, and content validity) and construct validity (encompassing convergent, discriminant, and factorial validity). Specifically, criterion validity was examined through comparisons with "gold standard" measures where available or through the use of clinically established diagnostic criteria and outcomes. Face validity was recorded if the article demonstrated the extent to which a substance use measure assessed what it was intended to measure. Lastly, construct validity was assessed through statistical analyses examining the correlation between survey measures and related constructs, thus ensuring that measures accurately reflect the theoretical components of substance use and SUDs. Articles that did not specify the validation methods were discussed among the three reviewers and decided by the senior author for consensus if discrepancies existed.
All data were coded independently by two reviewers (Y.T., E. C.). After extracting data from the first 10 articles, the two reviewers met to discuss any discrepancies among coding strategies. Disagreements were brought to the senior author (R. B.) for conflict resolution. Although the inclusion and exclusion criteria were determined a priori, the completion of data extraction revealed distinct differences between mental health and substance use studies that evaluated the psychometric properties of their respective measures. As the study developed, the results gathered from the data synthesis for substance use were substantially different from those for mental health assessment, and the authors determined that these separate domains would be better discussed in two separate manuscripts. Thus, the results presented in this study are from studies that validated substance use measures identified in our search.

Study characteristics
A total of 6950 results were initially obtained from the search. An additional 153 articles were identified by reviewing BRFSS reference lists [17]. A flow diagram documenting the search process and reasons for excluding studies is shown in Fig. 1. Of the 7103 articles, 2339 were duplicates and were excluded before the abstract/title review. After reviewing 4764 abstracts/titles, 3744 articles were excluded. Of the 1020 articles, a full-text review of the first 10% of articles demonstrated an almost perfect inter-rater reliability agreement between reviewers on which articles met the inclusion criteria (Gwet's AC: 0.8517 (0.8000-1.0000)). Following review of the full article text, 899 articles were removed. The key reasons for excluding the articles were because they (1) did not conduct validity testing (n = 874), (2) were conducted outside the United States (n = 1105), or (3) were focused on topics other than substance use (n = 878). For this review, a total of 46 articles met the inclusion criteria (Fig. 1). The characteristics of those 46 selected studies are presented in Table 1.
Fig. 1 Flow chart for the selection of studies*. *Studies could have been excluded for multiple reasons
All 46 studies included in this review reported the final sample size, with a mean of 1427 (median = 449) participants and an overall range of 23-10,167 participants. Only 13 studies reported response rate, and the response rates ranged between 13.4% [18] and 100% [32]. Twenty-six studies reported the survey duration, and it ranged from 1 month [32] to 120 months [20], with a mean of 28.48 months (median 13 months). Moreover, studies reported the mean age of the participants as < 30 years (n = 4), between 30 and 39 years (n = 16), and ≥ 40 years (n = 18). Another eight studies reported age groups or median age of the study population. Additionally, a majority (n = 37) of the studies were conducted in non-population-based clinical settings (e.g., inpatient, outpatient).

Participant recruitment strategy
The participant recruitment strategies of the studies included in this review are shown in Table 2. Of the 46 studies, only 4% (n = 2) examined SUD in the general population [20,28]; the rest (n = 44) of the studies were conducted in clinical or other population subgroups. In the first population-based study, 6664 adult Medicaid enrollees were recruited from 1 of 7 Florida regions who took part in the Florida Health Services Survey at least once between 1998 and 2008 [20]. Researchers assessed the internal psychometric properties of the Simple Screening Instrument for Substance Abuse (SSI-SA) but did not compare survey responses with SUD diagnoses in Medicaid clinical records. In the second population-based study, participants were selected from the National Epidemiologic Survey on Alcohol and Related Conditions-III (NESARC-III) sample, which included noninstitutionalized US adult residents (aged 18 years or older) [28]. The authors then selected 777 respondents for the procedural validity study and used a test-retest design to compare concordance of respondents' answers to the NESARC-III survey questions with a semi-structured interview, the Psychiatric Research Interview for Substance and Mental Disorders, DSM-5 version (PRISM-5), administered by a clinician.

Quality of studies
Risk of bias was assessed based upon the methodology used for instrument comparison and the statistical analysis conducted. Although several studies adopted recruitment strategies that limited their study population to specific groups (for example, only recruiting male or white populations), the risk-of-bias assessment employed by the current study did not account for recruitment. As a result, most of the included studies (n = 41) had a risk-of-bias score of 4 or higher (Table 1). Two studies had a score of 3 [30,34], one study had a score of 2 [29,48], and two studies had a score of only 1 [27,44]. Among those studies with low-quality assessment scores, four studies lacked statistical comparisons and reported prevalence estimates only [27,30,34,44]. There were three studies that did not report on validation methodology [27,44,48].

Survey measure
Among the articles included in this review, 89% (n = 41) used measures specifically designed for screening SUDs. For example, seven studies tested the validity of the measure's ability to screen for a specific SUD, including marijuana use [18,21,29,40,45], cocaine use [30], and opioid use [56]. Five studies validated measures for both substance use and mental health [23,25,40,44,54], of which one study used a measure for posttraumatic stress disorder (PTSD) screening [54]. The rest of the included studies did not specify a specific SUD for screening purposes but used a generic term for defining SUD. All measures and their frequency of use in the included studies are depicted in Fig. 2. The majority of studies validated one single measure, of which five studies validated the Addiction Severity Index (ASI) [32,35,41,58,59] and one study validated drug use subscales of ASI [26,37]. Five studies validated multiple survey measures. Two studies conducted survey measure validation in different study populations. One study conducted a preliminary exploration of the psychometric properties of the Substance Use Risk Profile Scale (SURPS) in 3 different populations: 195 undergraduate drinkers, 390 undergraduate students from Stony Brook University, and 4234 high school students in Canada [57]. In the second study, data were collected from two separate adult clinical samples (seriously mentally ill inpatients and patients presenting for evaluation at a chemical dependence program) to describe the rationale and test the validity and reliability of the Chemical Use, Abuse, and Dependence Scale (CUAD) [36].

Comparison measures for validation
Several different types of measures were used as comparison for the purpose of validation. Higher-quality comparison measures included items such as medical records, diagnoses, medical test results, or other SUD severity scales. A total of 10 studies conducted validity testing using at least one of these higher-quality comparison measures. Of these, three studies conducted criterion validity testing by comparing the following: (1) positive and negative 4P's Plus screens with positive and negative clinical assessment [22], (2) the Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS) with psychiatrist diagnosis [28], and (3) the Dartmouth Assessment of Lifestyle Instrument (DALI) with clinician diagnosis [43]. The remaining two studies conducted validity testing by comparing the following: (1) the Substance Use and Abuse Survey (SUAS) with medical chart [34] and (2) the Cannabis Use Disorders Identification Test-Revised (CUDIT-R) with DSM-5 diagnostic severity levels [45]. Another study compared the CUAD-derived DSM-III-R substance use disorder diagnoses with the chart diagnosis determined by the unit psychiatrists for validation [36]. Furthermore, two studies validated their measures by comparing with diagnostic standards: (1) the Cut down, Annoyed, Guilty, and Eye-Opener Substance Abuse Screening Tool (CAGE) was compared with SCID-generated drug use disorder diagnoses as the standard [24], and (2) the CUDIT-R was compared with ICD-10 dependence diagnosis [39]. Three studies conducted validity testing by comparing with laboratory test results, including urine test [31,58,59] and saliva drug testing [38].
Four studies conducted validity testing by comparing with other severity scales: (1) criterion validity testing by comparing the Marijuana Screening Inventory (MSI-X) with three different severity rating scales and selected variables [18], (2) construct validity testing by comparing the Personality Assessment Inventory Drug Problem Scale (PAI DRG) with ASI drug composite scores and severity ratings [33], (3) construct validity testing by comparing ASI with interviewer severity ratings and composite scores [35], and (4) concurrent validity testing of the ASI drug scale, which examined 25 participants who had drug metabolites detected in a urine sample obtained during the first interview and compared this result with their self-reported use of drugs during the 30-day assessment period in the ASI interview [58].
Most studies showed strong evidence of validity or had strong significant associations with other measures for comparison. Studies that compared substance use measures with physician diagnoses or medical records showed strong overall validity. For example, Rosenberg et al. conducted ROC analysis for criterion validity and concluded that DALI functioned significantly better than traditional instruments for substance use disorders among psychiatric patients [43]. Compared with DAST-10, the Short Inventory of Problems-Drug Use (SIP-DU) showed 100% sensitivity and 73.5% specificity for the detection of a drug use disorder. It was less sensitive at detecting self-reported current drug use (92.9%) and drug use detected by oral fluid testing or self-report (84.7%) [49]. However, studies demonstrated lack of validity for certain measures. For example: (1) Compared to urine screens, the ASI's questions about drug use in the past 30 days had poor concurrent validity, which suggested that the ASI has limited validity [59]. (2) Correlations were not statistically significant among South Shore Problem Inventory-revised (SSPI) subscales and three other substance abuse indices, such as self-related substance abuse (SRSA), quantity-frequency index (QFI) for alcohol consumption, and a one-item index measuring the frequency of marijuana use [40]. (3) Compared with oral fluid test results, using SIP-DU at a cut-off score (to be considered a positive test for alcohol screening) showed lower sensitivity and higher specificity for detecting current drug use [48].
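Because sensitivity and specificity recur throughout these results, a minimal Python sketch of how they are computed from a 2x2 comparison against a reference standard may help. The function name and the counts are hypothetical and are not taken from any study in this review.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity) from the cells of a 2x2 table.

    tp: screen positive, reference positive    fn: screen negative, reference positive
    tn: screen negative, reference negative    fp: screen positive, reference negative
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    sensitivity = tp / (tp + fn)  # share of true cases the screen detects
    specificity = tn / (tn + fp)  # share of non-cases the screen correctly rules out
    return sensitivity, specificity


# Hypothetical counts: 50 people with the disorder, 100 without.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=80, fp=20)
print(sens, spec)  # 0.9 0.8
```

A screen can reach 100% sensitivity while keeping a modest specificity if it flags every true case but also flags a sizeable share of non-cases, which is the pattern reported for some measures above.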

Discussion
This systematic review found 46 studies conducted in the US between 1979 and 2021 that tested the validity of substance use/SUD measures. Two studies were population based [20,28], while the rest were conducted in subpopulations or in clinical settings. Criterion validity and construct validity were the commonly used validation methods, and sensitivity and specificity were the most common statistical analyses for validation. More importantly, this review found that a myriad of survey measures was used to measure substance use/SUD. In addition, diverse methodologies were applied to measure validity, which makes comparability difficult. In general, most studies showed evidence of strong validity. For example, among those articles included in this review, 46 studies tested the psychometric properties of 43 different substance use screening measures. Of them, 16 tested the validity of psychometric properties by comparing other self-reported survey measures, and one study conducted criterion validity by comparing different racial or ethnic groups of offenders [25]. Fourteen studies conducted concurrent validity by comparing measures with an external independent source or "gold standard," such as physician/clinician diagnosis, medical records or assessment, severity scales, or urine/saliva drug testing. Frequently, researchers rely on self-reported information on substance use to save time and cost and to collect the required information on a larger sample than would be possible when making comparisons with a gold standard, such as a biological test or medical record.
The measures used in these studies varied greatly. The ASI, which was used most frequently in this review, was used in only five studies. Additionally, three articles specifically conducted validity testing for marijuana use. However, each of those studies used many diverse measures, such as MSI-X [18], the CUDIT-R [39,45], screen of drug use (SoDU) [60], NESARC [60], a two-item brief screen with no instrument name reported [60], and a one-item index measuring the frequency of marijuana use [40]. Multiple measures for one specific substance use might increase the likelihood of conflicting results, which can make it difficult to interpret and compare results across different studies. Thus, there is a need to adopt a standardized measure to ensure the results obtained are reliable and to be able to draw general conclusions.
In addition to the diverse measures, even the validation methods employed in the articles varied greatly. Although criterion and construct validity were the most commonly utilized validity measures, the specific type of criterion or construct validity varied among studies. For example, concurrent, predictive, and specification validity were reported as the three different types of criterion validity. Some studies employed multiple validation methods for a single survey measure, while others only used one. Moreover, different types of validity may achieve different objectives, which could explain the differences in statistical analyses of validation. This review also suggested that the statistical analyses used to test the validity of survey measures were diverse, with sensitivity and specificity being the most frequent analysis. Other statistical analyses such as ROC curve and correlation coefficient were also used to validate the survey measures.
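To make the ROC analysis mentioned above concrete, the sketch below computes the area under the ROC curve (AUC) as the probability that a randomly chosen case scores higher on the screening measure than a randomly chosen non-case, with ties counted as half. This rank-based formulation is one standard way to obtain the AUC; the function name and the scores are hypothetical and do not reproduce any analysis from the reviewed studies.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of case (label 1) and non-case (label 0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    # A case outranking a non-case counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical screening scores and reference-standard diagnoses (1 = disorder present).
scores = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 1, 0, 1, 1]
print(round(roc_auc(scores, labels), 3))  # 0.889
```

An AUC of 0.5 corresponds to a measure that discriminates no better than chance, and 1.0 to perfect separation of cases from non-cases; unlike a single sensitivity/specificity pair, it does not depend on choosing one cut-off score.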
Likewise, other differences were observed for demographic characteristics of participants.First, the validation of the substance use and SUD measures was primarily conducted in either inpatient or outpatient clinical settings, and only two studies were population based.Secondly, some studies had small sample sizes, which could significantly reduce the statistical power for finding differences between study groups.Moreover, some studies were occasionally limited to certain age or race/ethnicity groups, which could adversely affect the generalizability of findings.For example, several studies were restricted to White or Black/AA participants [6,18,21,24,25,31,32,36,39,41,58,59].In addition, information on race/ethnicity was missing from a few studies [19,22,27,34,55,57].Those studies might reflect racial disparities in SUD, as well as treatment for SUD.Although SUD is prevalent among all racial groups, the burden of disease is disproportionate among Black people, and treatment of SUD is less available for Black people [61].Three studies were limited to either males or females only [22,41,58].These studies provide valuable validation in the respective populations and may prove useful in other populations.However, further validation is needed in diverse populations for these measures to be generalizable.
SUD often co-occurs with many other physical and mental health conditions.Previous studies have shown a high co-occurrence and the increased risk of mental health disorders among individuals with SUD, which can be observed in clinical samples [62,63].In this review, only five studies validated measures for both substance use and mental health disorders.Results from studies assessing substance use and mental health simultaneously can help inform integrated treatment interventions by connecting individuals with additional service providers who can provide specialized services to treat the physical and emotional elements of mental health and SUD [64].Additional advantages of assessing co-occurring substance use and mental health include decreased hospitalization, fewer arrests, and increased housing stability [64].More importantly, assessing co-occurring substance use and mental health disorders in population research can identify the barriers and disparities of treatment access, including race/ ethnicity [65] and low treatment utilization among individuals with only substance use or only mental health disorders [66,67].
Although this review adhered to the PRISMA guidelines, it is not without limitations.It was limited to studies conducted in the US, and studies in other countries were not included.Research shows that significant contextual differences, such as burden of substance use disorders, cultural norms, legal frameworks, healthcare systems, and societal attitudes towards substance use, can vary widely across countries, potentially influencing the reliability and applicability of measures developed and validated in one context when applied to another [1][2][3].Our focus on US-based studies aims to ensure that the measures reviewed are relevant and applicable to the US population, providing a more accurate and contextspecific assessment of substance use and SUDs.
Although a rigorous search strategy was implemented, our search was limited to library databases.As such, key clinical surveys were used in hospitals or other specialty clinical settings that were not published in peer-reviewed journals and may be missing from our review.Additionally, our objectives were to summarize the validity of measures to assess the prevalence of substance use and SUD in the US estimated in population and sub-population-based surveys.Therefore, we did not specifically review the best clinical practices for survey administration in the clinical setting.Findings highlight the need to evaluate substance use surveys in a population-based setting to identify a valid survey for use across population-based surveys.The consistent use of one survey may provide for more accurate comparisons across populations.However, the main limitation of this review is that the articles included in this review are missing information about demographic characteristics, such as the distribution of race and ethnicity groups in the study population, and only 5 studies in this review reported education level of the participants [33, 37-39, 49, 58].The variation in the accuracy of self-reported data about substance use depends on education and socioeconomic status [68].The majority of studies included in this review did not report the response rate or the survey duration.Lastly, our analyses relied only on peerreview studies, and our review did not include internal studies that may have been conducted in large surveys, such as NSDUH.
This study has several strengths.To our knowledge, it is the first systematic review to summarize the validity of substance use/SUD measures used in questionnaires or instruments among US adults.This review has included 43 years of data among nine different literature databases.In addition, it has also included "gray literature" such as theses and Google Scholar, which can make significant contributions to systematic reviews by minimizing publication bias, enabling a more impartial assessment of the evidence, and publicizing null or negative findings [69].Another strength of the study is that the methodologic quality of validation studies was assessed by an adapted risk-of-bias tool, created especially for this assessment.Lastly, while previous reviews have explored the instruments used to assess substance use and the identification of disorders [7,8], this review uniquely concentrates on a comprehensive evaluation of the psychometric properties of measures assessing a broader spectrum of substances.This review aimed to distinguish from previous research, highlighting the diversity and specificity of instruments in current use, their applicability in various population and sub-population surveys, and the critical need for standardized, short, and versatile measures.
The findings of this review have several key implications. The study demonstrates that survey questions can be used to assess the prevalence of SUD in specific populations. However, most studies used different measures, suggesting there was no consensus on the best measure to use for assessing the prevalence of substance use and SUD. This lack of common measures illustrates the difficulty in assessing SUD in short surveys, especially for specific substances. Similar to the global measure that is used to indicate nonspecific psychological distress [70], a global measure is needed for assessing SUD in population-based studies. Only 5 out of 46 studies were conducted in population or sub-population-based settings. Therefore, more research needs to be conducted to validate these measures in population-based settings to confirm their sensitivity and specificity. Additionally, more studies need to validate measures using a "gold standard," such as an outside reliable measure, because comparing with self-reported substance use can result in misclassification bias. Overall, this systematic review illustrates a critical need to develop short measures for assessing SUD that do not require lengthy, time-consuming data collection that would be difficult to incorporate into population-based surveys assessing a multitude of health dimensions.
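To make the validation metrics discussed above concrete, the following is a minimal illustrative sketch of how sensitivity and specificity could be computed for a screening measure against a reference ("gold standard") classification. It is not taken from any included study: the function name `sensitivity_specificity` and the ten respondents are hypothetical, and the data are made up solely for illustration.

```python
def sensitivity_specificity(screen_positive, reference_positive):
    """Return (sensitivity, specificity) of a screener against a reference standard.

    Both arguments are equal-length sequences of booleans, one entry per
    respondent: True means "classified as having SUD".
    """
    if len(screen_positive) != len(reference_positive):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(screen_positive, reference_positive))
    tp = sum(1 for s, r in pairs if s and r)          # true positives
    fn = sum(1 for s, r in pairs if not s and r)      # false negatives
    tn = sum(1 for s, r in pairs if not s and not r)  # true negatives
    fp = sum(1 for s, r in pairs if s and not r)      # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positives and negatives")

    # Sensitivity: share of reference-positive respondents the screener flags.
    # Specificity: share of reference-negative respondents the screener clears.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data for 10 respondents (made up for illustration only).
screen = [True, True, True, False, False, False, False, True, False, False]
reference = [True, True, True, True, False, False, False, False, False, False]

sens, spec = sensitivity_specificity(screen, reference)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

If the reference column were itself a self-report rather than an independent measure, errors in the reference would carry into both estimates, which is the misclassification concern raised above.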

Conclusion
This systematic review summarized the validity of measures used to assess the prevalence of substance use and SUD in the US estimated in general population surveys and other population-based settings. Among the 46 studies included, this review demonstrated that a myriad of survey measures were used to assess substance use and SUD, and diverse methodologies were used to measure validity. This information suggests a lack of standardized, comparable survey measures for assessing the prevalence of substance use and SUD among US adults. This inconsistency makes it difficult to recommend the best measures to use in US surveys and highlights the need to develop better summary measures. Very few studies in this review were conducted in general population settings, which suggests that more research is needed to validate substance use measures in such settings. Although SUD is prevalent among all racial/ethnic, age, and gender/sex groups in the US, and studies in this review provided valuable validation in their respective populations, further validation is needed in diverse populations. Thus, future validation research needs to be conducted in population-based settings to adopt a standardized measure for substance use and SUD that can inform interventions aimed at detecting and managing problems associated with substance use and SUD and preventing avoidable premature US deaths.

Fig. 2 Frequency of survey measures used in included studies. Abbreviations in order: Texas Christian University Drug Screen (TCUDS), Substance Use and Abuse Survey (SUAS), the Simple Screening Instrument for Substance Abuse (SSI-SA), the Simple Screening Instrument (SSI), screen of drug use (SoDU), single-item screening questions (SISQs), Substance Dependence Severity Scale (SDSS), Substance Abuse Subtle Screening Inventory-2 (SASSI-2), Rapid Opioid Dependence Screen (RODS), Personality Assessment Inventory Drug Problem Scale (PAI DRG), National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), the Marijuana Screening Inventory (MSI-X), the Longitudinal Substance Use Recall for 12 Weeks instrument (LSUR-12), the Longitudinal Substance Use Recall Instrument (LSUR), Lifetime Severity Index for Cocaine Use Disorder (LSI-Cocaine), Healthcare Effectiveness Data and Information Set (HEDIS), the Drug Use Screening Inventory (DUSI), Dartmouth Assessment of Lifestyle Instrument (DALI), Cut down, Annoyed, Guilty, and Eye-Opener Substance Abuse Screening Tool (CAGE), the Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS), Alcohol, Smoking, and Substance Involvement Screening Test-Drug (ASSIST-Drug), Parents, Partners, Past, and Pregnancy Plus (4P's Plus), tobacco, alcohol, prescription medication, and other substance use (TAPS tool), Substance Use Brief Screen (SUBS), single question used from short inventory of problems-drug use (SIP-DU), Drug Abuse Screening Test (DAST), the Chemical Use, Abuse, and Dependence (CUAD), Addiction Severity Index (ASI)

Characteristics of included studies of validation testing. Columns: first author and publication year; participants' characteristics; study characteristics; survey instrument/questionnaire characteristics; validation methods; key findings; ROB assessment

Table 2 Participant recruitment strategies